INCORPORATION BY REFERENCE
[0001] This Application herein incorporates by reference: 
U.S. Patent Application Serial No. 10/781,443, entitled "Systems and Methods for
Determining Predictive Models of Discourse Functions" by M. Azara et al.;
U.S. Patent Application Serial No. 10/785,199, entitled "Systems and Methods for
Synthesizing Speech Using Discourse Function Level Prosodic Features" by M. Azara
et al.;
U.S. Patent Application Serial No. 10/XXX,XXX, entitled "System and Methods for
Resolving Ambiguity" by M. Azara et al., attorney docket # FX/A3007Q1/317004;
U.S. Patent Application Serial No. 10/684,508, entitled "Systems and Methods for
Hybrid Text Summarization" by L. Polanyi et al., each in its entirety.

BACKGROUND OF THE INVENTION 
1. Field of Invention

[0002] This invention relates to natural language processing. 
2. Description of Related Art 

[0003] Natural language speech offers a number of advantages over
conventional keyboard, tactile and other interfaces. Natural language interfaces are
among the earliest interfaces learned. Natural language interfaces are also among the
most intuitive interfaces for users, which may reduce the cognitive effort needed to
accomplish certain tasks.

[0004] Many command and control systems, knowledge repositories and 
order taking systems already benefit from conventional natural language speech 
interfaces. However, these conventional information retrieval systems lack any notion
of how to facilitate user interaction. Instead, these conventional systems impose a 
protocol of interaction on the user. Typically the interaction protocol is created by the 
interface designer and assumes that the task dialogue is the focus of the user's 
attention. 

[0005] In contrast, human-human dialogue uses prosodic features and
sequences of discourse functions to indicate appropriate speech presentation
strategies. For example, prosodic features and discourse function sequences indicate
the turn sequences in the exchange of information. Silences following speech
utterances may be used by one dialogue participant to indicate an expected response
from another participant. Changes in pitch frequency, rate of speech and/or other
prosodic features can be used to indicate questions and/or interrogatory remarks.
Similarly, patterns of speech, characteristic discourse function sequences and/or other
presentation strategies are used to indicate turns in the dialogue.

[0006] Conventional human-computer natural language interfaces lack
appropriate interaction models. That is, conventional natural language interfaces
merely present speech information as requested and/or accept speech information
when it is presented. A user of a conventional human-computer natural language
interface must therefore devote cognitive resources to understanding the appropriate
timing and turn taking necessary to interact with these conventional interfaces.

[0007] For example, some information retrieval systems implementing these
conventional natural language interfaces accept certain types of information only at
specified time intervals or only in response to certain prompts. When an information
request is received, these conventional information retrieval systems immediately
schedule the presentation of the requested information. That is, these conventional
systems assume that the human-computer dialogue is the primary dialogue. The
immediate presentation of information can distract the focus of attention if the user is
interacting with humans and/or devices controlled by conventional natural language
interfaces. The re-direction of the user's attention creates cognitive thrashing as the
user attempts to switch between partially completed tasks.

[0008] In human-human interactions, the competition for the user's focus of
attention is mediated by the interaction model of the human speakers. However, when
the user interacts with systems that lack appropriate interaction models, a greater
cognitive load is imposed since the user is forced to follow the interaction model of
the system. The user's attention is re-directed from completing the task to the
specifics of how to complete the task. These shifts in the focus of attention reduce the
effectiveness of these conventional natural language interfaces. As the number of
devices incorporating these conventional natural language interfaces increases, these
focus of attention problems also increase.
SUMMARY OF THE INVENTION
[0009] Thus, natural language interfaces that determine and adjust to the
user's interaction model would be useful. Moreover, natural language interfaces that
determine and adapt to the user's characteristic speech patterns would also be useful.
The systems and methods according to this invention determine the discourse
functions, prosodic features and turn information for a training corpus of speech
information. Predictive interaction models are determined based on the discourse
functions and prosodic features associated with each identified turn in the training
corpus. The discourse functions and prosodic features in non-training speech
information are determined. The dialogue turns are then predicted based on the
predictive interaction model, the determined discourse functions and the prosodic
features. Speech input and/or output is scheduled based on the predictive interaction
model.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0010] Fig. 1 is an overview of an exemplary system for determining 
predictive interaction models according to one aspect of this invention; 

[0011] Fig. 2 is an exemplary method for determining predictive interaction 
models according to this invention; 

[0012] Fig. 3 shows an exemplary system for determining predictive 
interaction models according to this invention; 

[0013] Fig. 4 is an exemplary method of determining interactions according 
to one aspect of this invention; 

[0014] Fig. 5 is an exemplary system for determining interactions according
to one aspect of this invention; and 

[0015] Fig. 6 is an exemplary data structure for storing predictive models 
according to one aspect of this invention. 

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS 

[0016] Fig. 1 is an overview of an exemplary system for determining
predictive interaction models according to one aspect of this invention. The system
for determining predictive interaction models 100 is connected via communications
link 99 to a system for determining interactions 200; an information repository 300; an
internet-enabled personal computer 400; an automatic speech recognition system 500;
and a sensor device 600, such as a microphone, a video capture system or any other
device capable of capturing prosodically modulated natural language information.

[0017] In one of the various exemplary embodiments according to this
invention, a user of the internet-enabled personal computer 400 requests the
determination of predictive interaction models based on turn annotated speech-based
training instances contained in the information repository 300. It will be apparent that
speech information includes, but is not limited to, orally encoded natural language
speech, signed natural language gestures and/or hand, body, pen, and device gestures
and/or any prosodically modulated natural language information. The speech-based
training instances are forwarded over the communications links 99 to the automatic
speech recognition system 500 where the recognized speech information is
determined. The discourse functions in the recognized speech information and the
prosodic features in each speech-based training instance are determined and associated
with turn information to determine a predictive interaction model.

[0018] In another exemplary embodiment according to this invention, a user
of the internet-enabled personal computer 400 requests the retrieval of information
contained in the information repository 300 using a speech request. The speech
request is mediated by the system for determining interactions 200. The speech
request is forwarded over the communications links 99 to the automatic speech
recognition system 500 where the recognized speech information is determined. The
recognized speech is forwarded over communications links 99 to the system for
determining interactions 200 where a prediction of the likelihood of a turn is
determined. The turn prediction is then returned to the internet-enabled personal
computer 400.

[0019] While the speech request is processed, the user of the internet- 
enabled personal computer 400 issues a file-save-as voice command to a speech- 
enabled word processing application. During the speech based file-save-as dialogue, 
the previously requested information becomes available. Although the two speech- 
based interactions use different speech processing systems, both of the speech 
processing systems use exemplary embodiments of the system for determining 
interactions 200 to schedule the presentation of information to the user. 

[0020] Thus, the requested information is not presented over the file-save-as 
dialogue. Instead, discourse functions and prosodic features are monitored and used 
with a previously determined predictive interaction model to determine the
appropriate turn and/or politeness interruption markers to be used in the dialogue. 

[0021] In another exemplary embodiment according to this invention, a
user of the microphone or sensor device 600 uses speech to request a temperature control
system in the internet-enabled personal computer 400 to set the house temperature to
thirty degrees Celsius. However, the request to the temperature control system is
immediately followed by speech from Jane: "Hi John, I see you've set the temperature
for me, please set it back to twenty seven, I really don't mind the cold." The
temperature control system prepares the confirmation message "The temperature has
been set to thirty degrees Celsius" and requests a turn prediction from the system for
determining interactions 200 over communications links 99. The turn prediction
allows the temperature control system to avoid barging in over Jane's speech. Instead,
the temperature control system uses the received turn prediction to delay scheduling
any confirmation message until after Jane has finished speaking but before John is
predicted to respond to Jane.

[0022] In still other exemplary embodiments according to this invention, if
an appropriate turn or interruption point is not predicted within a reasonable period
and/or the presentation information is too critical to delay, the system for determining
interactions adds politeness interruption markers to the presentation information.
Politeness interruption markers include but are not limited to phrases such as "Excuse
me but <presentation information>", "I hate to interrupt, however <presentation
information>" and the like. For example, if the temperature confirmation information
is critical information, then the system for determining interactions 200 adds
politeness interruption markers to generate, schedule and/or present the phrase
"Excuse me John, the temperature has now been set to thirty degrees Celsius". The
politeness interruption marker alerts the dialogue participants to a possible dialogue
discontinuity and allows the focus of the participants' attention to be selectively
re-directed to the presentation information.
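
By way of a non-limiting illustration, the decision described above might be sketched as follows. The names Presentation and schedule_presentation, the 0.7 likelihood threshold and the five second waiting limit are assumptions introduced only for this sketch and do not appear in the embodiments described herein.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Presentation:
    """Information waiting to be presented (hypothetical container)."""
    text: str
    critical: bool = False


def schedule_presentation(item: Presentation, turn_likelihood: float,
                          waited_s: float, turn_threshold: float = 0.7,
                          max_wait_s: float = 5.0,
                          addressee: Optional[str] = None) -> Optional[str]:
    """Return the phrase to present now, or None to keep waiting for a turn."""
    if turn_likelihood >= turn_threshold:
        # A turn is predicted, so no interruption marker is needed.
        return item.text
    if item.critical or waited_s >= max_wait_s:
        # No turn was predicted in time, or the information cannot wait:
        # prepend a politeness interruption marker.
        name = f" {addressee}" if addressee else ""
        return f"Excuse me{name}, {item.text[0].lower()}{item.text[1:]}"
    return None
```

In this sketch a non-critical item is simply held back until either a turn is predicted or the assumed waiting limit expires.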

[0023] The approximation of typical human-human dialogue interactions by
the system for determining interactions 200 reduces the cognitive load on
dialogue participants. That is, the cognitive load of interacting with the system is
reduced when the information is presented in a manner expected by the user.
[0024] Fig. 2 is an exemplary method for determining a predictive
interaction model according to this invention. The process begins at step S10 and
continues immediately to step S20.

[0025] In step S20, a training corpus of annotated recognized speech
information is determined. The annotated recognized speech information may be
obtained from the automatic recognition of a corpus of speech information such as the
Switchboard Corpus of the Linguistic Data Consortium and the like. The training
corpus of recognized speech information is annotated with turn, discourse function
and prosodic feature information. In various exemplary embodiments according to
this invention, the discourse function information is based on a theory of discourse
analysis. The turn information indicates when the speaker has invited other dialogue
participants to continue the conversation. The turn taking information may be
indicated using an Extensible Markup Language (XML) tag or any other type of
indicator or marker. The turn information is typically encoded into the speech
information manually by a human coder but may also be determined automatically
based on the theory of discourse analysis determined in step S30.
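
As a purely illustrative sketch of XML-encoded turn information, the following reads a small annotated fragment and reports, for each discourse function, whether it is followed by a turn marker. The element names dialogue, df and turn, and the attributes speaker and type, are hypothetical; no particular tag set is prescribed by the embodiments described herein.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation: each <df> is one discourse function and an empty
# <turn/> element marks a point where the floor passes to another speaker.
SAMPLE = """<dialogue>
  <df speaker="A" type="question">Is the report ready</df>
  <turn/>
  <df speaker="B" type="answer">Almost</df>
  <df speaker="B" type="elaboration">I need one more hour</df>
  <turn/>
</dialogue>"""


def read_turn_labels(xml_text):
    """Return (text, followed_by_turn) pairs, one per discourse function."""
    children = list(ET.fromstring(xml_text))
    labelled = []
    for i, node in enumerate(children):
        if node.tag != "df":
            continue
        nxt = children[i + 1] if i + 1 < len(children) else None
        labelled.append((node.text, nxt is not None and nxt.tag == "turn"))
    return labelled
```

The resulting pairs could serve as the turn labels against which prosodic features and discourse functions are later associated.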

[0026] The theory of discourse analysis may include but is not limited to the
Unified Linguistic Discourse Model (ULDM) of Polanyi et al., described in
co-pending U.S. Patent Application Serial No. 10/684,508, herein incorporated by
reference in its entirety, Rhetorical Structures Theory, or any other known or later
developed theory of discourse analysis capable of determining discourse functions in
the recognized speech information.

[0027] The discourse functions are intra-sentential and/or inter-sentential
phenomena that are used to accomplish task, text and interaction level discourse
activities such as giving commands to systems, initializing tasks, identifying speech
recipients and marking discourse level structures such as the nucleus and satellite
distinction described in Rhetorical Structures Theory, the coordination, subordination
and N-aries, as described in the ULDM and the like. Thus, in some cases, the
discourse constituent of the selected theory of discourse analysis may correlate with a
type of discourse function.

[0028] The discourse function information may be determined automatically
or may be entered manually. For example, the ULDM discourse parser may be used
to automatically annotate or segment the recognized speech information into discourse
functions. In various other exemplary embodiments according to this invention,
human coders segment the recognized speech information into discourse functions
and/or annotate the recognized speech information with discourse function codes or
tags.

[0029] It will be apparent that the recognized speech information in the 
training corpus may reflect the speech of a large group of speakers, a specific speaker, 
a specific speech genre or any other known or statistically significant grouping of 
speech information without departing from the scope of this invention. After the
theory of discourse analysis has been determined, control continues to step S40.

[0030] In step S40, prosodic features in the corpus of speech information are
determined. The prosodic features include but are not limited to initial pitch
frequencies, silence durations, volume, stress, boundary tones, number of intonation
boundaries, change in frequency and the like. After the prosodic features in the
speech have been determined, control then continues to step S50.
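
A minimal sketch of how a few of these prosodic features might be derived is given below, assuming that each discourse function has already been aligned to a start time, an end time and a non-empty list of pitch estimates. The Segment type, its field names and the feature names are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One time-aligned discourse function (hypothetical representation)."""
    start_s: float
    end_s: float
    pitch_hz: List[float]  # pitch estimates for voiced frames, assumed non-empty


def prosodic_features(segments, index):
    """Compute a small set of prosodic features for segments[index]."""
    seg = segments[index]
    # With no neighbouring segment, the corresponding silence is taken as zero.
    prev_end = segments[index - 1].end_s if index > 0 else seg.start_s
    next_start = (segments[index + 1].start_s
                  if index + 1 < len(segments) else seg.end_s)
    return {
        "initial_pitch_hz": seg.pitch_hz[0],
        "pitch_change_hz": seg.pitch_hz[-1] - seg.pitch_hz[0],
        "preceding_silence_s": seg.start_s - prev_end,
        "following_silence_s": next_start - seg.end_s,
    }
```

Volume, stress and boundary tones would require access to the audio signal itself and are omitted from this sketch.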

[0031] In step S50, a predictive interaction model is determined. The
predictive interaction model associates the annotated turn information with identified
prosodic features and discourse functions. The associations may be determined using
statistics, machine learning, rules and/or any other known or later developed method.
It will be apparent that in various other exemplary embodiments according to this
invention, other sources of turn information may also be used.

[0032] A predictive interaction model is then determined. The predictive
interaction model accepts prosodic features and a current discourse function and
returns the likelihood that the next discourse function is associated with a turn event
in the dialogue. The turn indicator may be a binary yes/no indicator, a likelihood
percentage or any other known or later developed indicator of the likelihood of a turn.
It will be apparent that in various exemplary embodiments according to this invention
a prior predictive interaction model is developed based on leading prosodic features
indicative of a turn. A prior predictive interaction model is useful in interactive
settings. For example, a prior predictive model is useful in predicting when an
interruption is appropriate or minimally intrusive based on the speech patterns and
prosody of the user.
[0033] Posterior predictive interaction models are also developed based on
leading and/or following prosodic feature indicators. Posterior predictive interaction
models are helpful in non-interactive environments. For example, the posterior
predictive interaction models are used to differentiate transcript attribution
information for multiple speakers in a television program or other multi-speaker
setting. The differentiation of multiple speakers is also useful in tailoring language
models based on speaker identity and/or performing various other non-interactive
processing tasks. The predictive interaction model is determined based on machine
learning, Naive Bayes, decision trees and the like. The predictive interaction model is
then saved to memory and/or used. Control then continues to step S60 and the
process ends.
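
One way the Naive Bayes option mentioned above might be realized is sketched below. It is a small categorical Naive Bayes classifier with add-one smoothing over discretized features; the class name TurnNaiveBayes and the feature names are hypothetical, and the sketch is not presented as the training procedure of any particular embodiment.

```python
import math
from collections import Counter, defaultdict


class TurnNaiveBayes:
    """Categorical Naive Bayes over discretized features, add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # (label, name) -> value counts
        self.values = defaultdict(set)              # name -> values seen so far

    def update(self, features, is_turn):
        """Add one annotated training instance."""
        self.class_counts[is_turn] += 1
        for name, value in features.items():
            self.feature_counts[(is_turn, name)][value] += 1
            self.values[name].add(value)

    def turn_likelihood(self, features):
        """Return the estimated probability that a turn follows."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for label in (True, False):
            score = math.log((self.class_counts[label] + 1) / (total + 2))
            for name, value in features.items():
                seen = self.feature_counts[(label, name)][value]
                # The extra 1 reserves probability mass for unseen values.
                denom = self.class_counts[label] + len(self.values[name]) + 1
                score += math.log((seen + 1) / denom)
            log_scores[label] = score
        peak = max(log_scores.values())
        weights = {k: math.exp(v - peak) for k, v in log_scores.items()}
        return weights[True] / (weights[True] + weights[False])
```

Continuous prosodic features such as silence duration would need to be binned (for example into "short" and "long") before being passed to this sketch.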

[0034] Fig. 3 shows an exemplary system for determining a predictive
interaction model according to this invention. The system for determining predictive
interaction models 100 is comprised of a memory 20; a processor 30; a prosodic
feature determination circuit or routine 40; an interaction model determination circuit
or routine 50; and a discourse analysis circuit or routine 60, each connected to the
input/output circuit 10, and via communications link 99 to an information repository
300; an internet-enabled personal computer 400; an automatic speech recognition
system 500 and a sensor device 600 such as a microphone, a video capture device or
any other device capable of capturing prosodically modulated natural language
information.

[0035] A user of an internet-enabled personal computer 400 initiates a
request to determine a predictive interaction model according to one aspect of this
invention. The request is forwarded over the communications links 99 and mediated
by the system for determining predictive interaction models 100.

[0036] The system for determining predictive interaction models 100
retrieves turn annotated speech training instances from the information repository 300
by activating input/output circuit 10. The system for determining predictive
interaction models 100 forwards the turn annotated speech training instances to the
automatic speech recognizer 500 where the annotated recognized speech is
determined. The processor 30 retrieves the turn annotated recognized speech and
stores it in the memory 20.
[0037] The processor 30 activates the prosodic feature determination circuit
or routine 40. The prosodic feature determination circuit or routine 40 determines the
prosodic features in the turn annotated speech training instances. For example, the
initial pitch frequency, preceding and following silence durations, volume, change in
pitch frequency, stress and number of intonational boundaries are identified in the turn
annotated speech training instances. However, it should be apparent that any known
or later developed prosodic feature useful in determining interaction models may also
be used without departing from the spirit or scope of this invention.

[0038] The processor 30 activates the discourse analysis circuit or routine 60
to determine discourse functions in the recognized speech information. The discourse
analysis routine or circuit 60 uses a theory of discourse analysis such as the ULDM,
Rhetorical Structures Theory or any other known or later developed method of
determining the discourse functions in the recognized speech information.

[0039] The processor 30 then activates the interaction model determination
circuit or routine 50. The interaction model determination circuit or routine 50 uses
Naive Bayes, decision trees, rules or any other known or later developed method for
determining a predictive interaction model based on the identified prosodic features
and the discourse functions in the annotated recognized speech information. The
predictive interaction model is then returned to the user of the internet-enabled
personal computer 400 for use and/or saved for future use.

[0040] Fig. 4 is an exemplary method of determining interactions according
to one aspect of this invention. The process begins at step S100 and immediately
continues to step S110.

[0041] In step S110, the recognized speech information is determined. The
recognized speech information may originate from a telephone, a microphone array
monitoring a room, a passenger compartment of an automobile, a recorded transcript
or any other known or later developed source of speech information. After the
recognized speech information has been determined, control continues to step S120
where the theory of discourse analysis is determined.

[0042] The theory of discourse analysis may be determined based on a user
profile entry, a user selection from a dialogue box, or using any other known or later
developed method. After the theory of discourse analysis has been determined,
control continues to step S130 where the discourse functions in the speech are
determined.

[0043] In various exemplary embodiments according to this invention, the
discourse functions in the speech are determined based on a theory of discourse
analysis, a predictive model for discourse functions or any other method for
determining discourse functions in the speech information. As discussed above,
discourse functions in the speech information are intra-sentential and/or
inter-sentential phenomena that are used to accomplish task, text and interaction level
discourse activities such as giving commands to systems, initializing tasks, identifying
speech recipients and marking discourse level structures such as the nucleus and
satellite distinction described in Rhetorical Structures Theory, the coordination,
subordination and N-aries, as described in the ULDM and the like. After the
discourse functions in the speech information have been determined, control continues
to step S140.

[0044] In step S140, the prosodic features in the speech information are
determined. The prosodic features include but are not limited to initial pitch
frequency, volume, location and/or duration of silences in the speech, changes in pitch
frequency and the like. Control then continues to step S150.

[0045] The predictive interaction model is selected and/or determined in
step S150. The predictive interaction model is determined based on a user profile, the
language, genre or various other characteristics of the speech information. For
example, in one of the various exemplary embodiments according to this invention, a
voice-print sub-module identifies the voice of the user. A user specific interaction
model is then loaded from a predictive interaction storage structure based on the
determined identity of the user. In this way, idiolectic or user specific prosodic
features and idiolectic or user specific information presentation techniques are
identified and used to increase the accuracy of the interaction model. However, it
should be apparent that speaker independent predictive interaction models can also be
used without departing from the spirit or scope of this invention. After the predictive
interaction model is determined, control continues to step S160.
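
The selection in step S150 might, for example, be sketched as a lookup that prefers the most specific stored model and falls back to a speaker independent one. The keying of models by a (user, language, genre) tuple and the function name select_model are assumptions of this sketch; the voice-print identification itself is not shown.

```python
def select_model(models, user_id=None, language=None, genre=None):
    """Return the most specific stored model available.

    `models` maps hypothetical (user_id, language, genre) keys to model
    objects; None in a key position means "any".
    """
    candidates = (
        (user_id, language, genre),  # user specific model
        (None, language, genre),     # speaker independent, same language and genre
        (None, language, None),      # speaker independent, same language
        (None, None, None),          # general fallback
    )
    for key in candidates:
        if key in models:
            return models[key]
    raise KeyError("no predictive interaction model available")
```

The order of the fallback keys is a design choice of the sketch; other orderings, such as preferring genre over language, would be equally possible.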

[0046] In step S160, the first discourse function is selected for processing.
Control then continues to step S170 where a turn is predicted based on the identified
prosodic features and the determined discourse functions.
[0047] In one of the various exemplary embodiments according to this
invention, an anterior/prior or leading predictive model is determined. The anterior
predictive model uses prosodic features and discourse function information preceding
the turn in the dialogue to predict the likelihood that the next discourse function
occurs after a turn. In other exemplary embodiments according to this invention, the
predictive interaction model is a posterior or following interaction model. The
posterior interaction model predicts the likelihood that a turn occurred in a prior
sequence of discourse functions. After the turn prediction has been determined,
control continues to step S180.
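
The difference between the two kinds of model lies in which evidence is available when the prediction is made. A minimal sketch, assuming the discourse functions are held in a list and a candidate turn lies just before position `boundary`, follows; the function names and the window size of two are hypothetical.

```python
def prior_context(functions, boundary, window=2):
    """Evidence for a prior (leading) model: only what precedes the boundary."""
    return {"preceding": functions[:boundary][-window:], "trailing": []}


def posterior_context(functions, boundary, window=2):
    """Evidence for a posterior (following) model: both sides of the boundary."""
    return {
        "preceding": functions[:boundary][-window:],
        "trailing": functions[boundary:][:window],
    }
```

A prior model can therefore be consulted while the dialogue is still unfolding, whereas a posterior model must wait until the trailing discourse functions have been observed.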

[0048] In step S180, the turn prediction is used to process the requested
responses. For example, in various exemplary embodiments according to this
invention, the turn prediction is used to schedule the presentation of the requested
information retrieved from an information repository, a voice response unit or other
type of information source. In this way, the requested information is presented as part
of the flow of the dialogue. Since the requested information is presented at an
appropriate point and/or with an appropriate politeness interruption marker, the
cognitive load associated with processing the information is reduced. In
multi-participant and/or multi-device environments, scheduling the presentation of
requested information based on the turn prediction better integrates the requested
information into the flow of the group dialogue.

[0049] In still other exemplary embodiments according to this invention, the
turn prediction may also be used to dynamically add politeness interruption markers
such as "Excuse me, but" or other interruption phrases. These interruption phrases
help focus the attention of the dialogue participants on the new information and
inform participants if the new information requires immediate attention. The
interruption markers may also be used to inform the dialogue participants that the
information may not directly relate to the current dialogue topic. Similarly, politeness
interruption markers may be eliminated when the turn prediction indicates a response
is currently anticipated or not required by the language, urgency and/or other factors.
The presentation of salient information at appropriate points in the dialogue helps
reduce the cognitive load on the user. After a response has been prepared and/or
scheduled based on the turn prediction, control continues to step S190.
[0050] In step S190, a determination is made whether there are additional
discourse functions to be processed. If it is determined that there are additional
discourse functions to be processed, control continues to step S200. In step S200, the
next discourse function in the recognized speech information is determined. After the
next discourse function has been determined, control jumps to step S170. The steps
S170-S200 are then repeated until it is determined in step S190 that no additional
discourse functions remain to be processed. Control then continues to step S210 and
the process ends.
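
The control flow of steps S160 through S210 can be summarized in a short sketch. The callables predict_turn and on_turn, and the 0.5 threshold, are placeholders assumed for illustration; any predictive interaction model and any scheduling action could stand in their place.

```python
def process_dialogue(discourse_functions, prosody, predict_turn, on_turn,
                     threshold=0.5):
    """Predict a turn after each discourse function and act on likely turns.

    `prosody[i]` holds the prosodic features of `discourse_functions[i]`.
    """
    predictions = []
    for index, function in enumerate(discourse_functions):   # S160, S190, S200
        likelihood = predict_turn(prosody[index], function)  # S170
        predictions.append(likelihood)
        if likelihood >= threshold:                          # S180
            on_turn(index)
    return predictions                                       # S210
```

For instance, `on_turn` might release a queued confirmation message, while a prediction below the threshold leaves the message queued for the next discourse function.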

[0051] Fig. 5 is an exemplary system for determining interactions 200
according to one aspect of this invention. The system for determining interactions
200 is comprised of a memory 110; a processor 120; a prosodic feature determination
circuit or routine 130; predictive interaction model storage 140; and a discourse
analysis routine or circuit 150, each connected via input/output circuit 105 to
communications link 99 and to an information repository 300; an internet-enabled
personal computer 400; an automatic speech recognition system 500; and a sensor
device 600.

[0052] A user of the sensor device 600 or other prosodically modulated
natural language input device requests a change in the temperature from 25 to 30
degrees Celsius. After the user of the sensor device 600 initiates the temperature
change request, the user begins an interaction with another system controlled by a
natural language interface. The speech information is forwarded via the
communications links to the automatic speech recognition system 500.

[0053] The processor 120 of the system for determining interactions
activates the input/output circuit 105 to retrieve the speech information and the
recognized speech from the automatic speech recognition system 500. The recognized
speech information is stored in memory 110.

[0054] The processor 120 then activates the discourse analysis routine or
circuit 150 to determine the discourse functions in the recognized speech. The
discourse analysis routine or circuit 150 may use a theory of discourse analysis such as
the ULDM, Rhetorical Structures Theory and/or any known or later developed method
of determining discourse functions in the speech information.
[0055] The processor 120 activates the prosodic feature determination
circuit 130 to determine the prosodic features in the speech. The prosodic features
may include, but are not limited to, initial pitch frequency, volume, location and/or
duration of silences in the speech, changes in pitch frequency and the like.

[0056] The processor 120 retrieves a predictive interaction model from the
predictive interaction model storage structure 140. In various exemplary
embodiments according to this invention, the predictive model is determined using
machine learning, statistical analysis, rules or any other method and is stored in the
predictive interaction model storage 140. However, it will
be apparent that the predictive interaction model may also be dynamically determined
without departing from the spirit or scope of this invention.

[0057] The predictive interaction model is typically an incremental
predictive interaction model that learns incrementally. That is, in various other
exemplary embodiments according to this invention, the dynamic collaborative
grounded truth feedback mechanisms of Thione et al., as described in co-pending
co-assigned U.S. Patent Application Serial No. XX/XXX,XXX, attorney docket No.
FX/A3005-317002, entitled "Systems and Method for Collaborative Note-Taking",
and filed February 2, 2004, are used to incrementally update the predictive interaction
model. However, it will be apparent that static predictive interaction models may also
be used in the practice of this invention.
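
To illustrate what incremental learning can mean here, the sketch below keeps running counts and updates them each time feedback on an actual turn becomes available. It is a generic stand-in assumed for illustration and is not the feedback mechanism of Thione et al.; the class name, method names and the choice of key are all hypothetical.

```python
from collections import defaultdict


class IncrementalTurnModel:
    """Running estimate of turn likelihood per observed context key."""

    def __init__(self):
        self.turns = defaultdict(int)  # key -> number of times a turn followed
        self.seen = defaultdict(int)   # key -> number of times the key was seen

    def predict(self, key):
        # Add-one smoothing, so an unseen key yields 0.5.
        return (self.turns[key] + 1) / (self.seen[key] + 2)

    def feedback(self, key, turn_occurred):
        """Fold one observed outcome into the model."""
        self.seen[key] += 1
        if turn_occurred:
            self.turns[key] += 1
```

A key might combine a discourse function type with a binned prosodic feature, such as ("question", "long silence"), so that each observed dialogue refines the estimate for that context.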

[0058] The processor 120 then applies the retrieved predictive interaction
model to the determined discourse functions and the determined prosodic feature
information. The predictive interaction model returns a prediction. The value of the
prediction reflects the likelihood that a turn between dialogue participants has
occurred or will occur after the current discourse function. As discussed above, prior
or leading predictive interaction models return prediction values useful in determining
a dialogue turn in an interactive environment. Various other exemplary embodiments
according to this invention use posterior or trailing predictive interaction
models, either alone or in combination with the prior predictive interaction models.
The prediction values returned by the posterior or trailing predictive interaction
models are helpful in dictation and other non-interactive and/or delay tolerant
environments. Responses and/or further input are scheduled based on the determined
prediction model.
[0059] Fig. 6 is an exemplary data structure for storing predictive interaction
models 700 according to one aspect of this invention. The data structure for storing predictive
interaction models 700 comprises an optional user identifier 710; a prosodic
features portion 720; a preceding discourse function sequence portion 730; a trailing
discourse function sequence portion 740; and a prediction portion 750.
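
One way to picture a row of this data structure is as a small record type with one attribute per portion. The following is a minimal sketch, not the storage format of the invention: the class and attribute names are assumptions, and the sample values are those the description gives for the first row.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PredictiveInteractionModelRow:
    """One row of the data structure 700; attribute names are illustrative."""

    user_id: Optional[str]       # optional user identifier 710
    prosodic_features: str       # prosodic features portion 720
    preceding_sequences: str     # preceding discourse function sequence portion 730
    trailing_sequences: str      # trailing discourse function sequence portion 740
    prediction: float            # prediction portion 750

    def __post_init__(self) -> None:
        # The prediction is a likelihood, so it is checked against the range 0 to 1.
        if not 0.0 <= self.prediction <= 1.0:
            raise ValueError("prediction must be a likelihood between 0 and 1")


first_row = PredictiveInteractionModelRow(
    user_id="A13D",
    prosodic_features="DF[LAST].PF[SILENCE_FOLLOWING] > 0.10",
    preceding_sequences="(A1-A6)A1A4;",
    trailing_sequences="",
    prediction=0.60,
)
```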

[0060] The optional user identifier 710 uniquely associates a specific user
with a specific interaction model. The user identifier may be an alphanumeric
identifier, a uniquely identified name or any other information that uniquely identifies
the user with a predictive interaction model.

[0061] The prosodic features portion 720 contains prosodic features
associated with an exemplary predictive interaction model. For example, the silence
following a discourse function is a prosodic feature that may indicate a turn in the
dialogue. Similarly, the initial pitch of discourse functions in a sequence of one or
more discourse functions, stress on one or more discourse function segments, or any
other consistent prosodic feature may be used to predict turns in the dialogue.

[0062] The preceding discourse function sequence portion 730 contains
information about the sequence of discourse function types likely to precede any
predicted turn. Speaker-independent preceding discourse function sequences may be
determined based on the grammar and syntax rules of the dialogue language.
However, in various other exemplary embodiments according to this invention,
variations of the presentation strategies required by the dialogue language may be
based on the user, the genre and/or any other known or later developed speech
characteristic.

[0063] The discourse function sequences in the preceding discourse function
sequence portion 730 are illustrated using a two-letter code to represent each type of
discourse function. Multiple sequences of preceding discourse functions may be
associated with multiple prosodic features. For example, in one of the exemplary
embodiments according to this invention, an AWK- or PERL-like pattern expression is
used to associate one or more sequences of prosodic discourse functions with one or
more sequences of preceding discourse functions. The symbol "." matches any type
of discourse function. Hyphenated terms in brackets such as "(A1-A9)" represent any
single code from the set A1 through A9. The symbol "+" matches a single instance of
the code following the symbol. The use of the pattern expression notation described
above is merely exemplary. Thus, it will be apparent that any known or later
developed method of representing discourse function sequences and/or prosodic
features may be used in the practice of this invention.
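
Because the notation is described as AWK- or PERL-like, one plausible way to evaluate it is to translate each sequence into an ordinary regular expression. The sketch below makes several assumptions that the description does not state: every code is one capital letter followed by one digit, ";" separates alternative sequences, a sequence must match the most recent codes in the history, and "+" is read as "one or more of the code that follows". The function names are hypothetical.

```python
import re
from typing import Sequence

ANY_CODE = r"[A-Z]\d"  # assumed shape of a discourse function code, e.g. "A1"
TOKEN = re.compile(
    r"\((?P<lo>[A-Z]\d)-(?P<hi>[A-Z]\d)\)"   # "(A1-A6)": any single code in the range
    r"|(?P<plus>\+)?(?P<code>[A-Z]\d)"       # "A4" or "+A9"
    r"|(?P<any>\.)"                          # ".": any type of discourse function
)


def compile_sequence(sequence: str) -> "re.Pattern[str]":
    """Translate one sequence into a regex anchored at the end of the history."""
    sequence = sequence.replace(" ", "")
    parts = []
    position = 0
    while position < len(sequence):
        token = TOKEN.match(sequence, position)
        if token is None:
            raise ValueError(f"cannot parse {sequence[position:]!r}")
        if token.group("lo"):
            lo, hi = token.group("lo"), token.group("hi")
            if lo[0] != hi[0] or lo[1] > hi[1]:
                raise ValueError(f"unsupported range {token.group(0)!r}")
            parts.append(f"{lo[0]}[{lo[1]}-{hi[1]}]")
        elif token.group("code"):
            code = token.group("code")
            # Assumption: "+" means one or more repetitions of the code after it.
            parts.append(f"(?:{code})+" if token.group("plus") else code)
        else:
            parts.append(ANY_CODE)
        position = token.end()
    return re.compile("(?:" + "".join(parts) + ")$")


def precedes_turn(portion: str, history: Sequence[str]) -> bool:
    """True if the most recent codes in `history` match any ';'-separated sequence."""
    text = "".join(history)
    sequences = [s for s in portion.split(";") if s.strip()]
    return any(compile_sequence(s).search(text) for s in sequences)


matched = precedes_turn("(A1-A6)A1A4;", ["A2", "A3", "A1", "A4"])
```

Joining the codes into one string keeps the translation simple; it relies on the assumed fixed two-character code shape so that a match cannot start in the middle of a code.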
[0064] The trailing discourse function sequence portion 740 contains
information about the sequence of discourse function types likely to follow a predicted
turn. The trailing discourse function sequences in the trailing discourse function
sequence portion 740 are also illustrated using a two-letter code to represent each
type of discourse function and associate discourse function sequences with prosodic
features. The prediction portion 750 reflects the likelihood that the predictive
interaction model correctly predicts a turn.

[0065] The first row of the data structure for storing predictive interaction
models 700 contains the value "A13D" in the user id portion 710. This indicates that
the predictive interaction model is associated with the user identified as "A13D". The
predictive interaction model may utilize presentation strategies specific to the speech
patterns of user "A13D" to predict turns in the dialogue. The user may be identified
to the system for determining interactions using voice print recognition, a user login
process or any other known or later developed method of identifying the user.

[0066] The prosodic features portion 720 of the first row of the exemplary
data structure for storing predictive interaction models 700 contains the value
"DF[LAST].PF[SILENCE_FOLLOWING] > 0.10". This indicates that if the silence
following the "LAST" discourse function in the sequence exceeds 0.10 seconds
and the preceding discourse function sequence
constraints are satisfied, then a turn is predicted with a 60% probability.
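
A constraint of this form can be checked mechanically once the prosodic features of each discourse function are available. The sketch below assumes the features arrive as one dictionary per discourse function, keyed by feature name, and supports only the simple "DF[index].PF[feature] operator number" shape; the function name and the input layout are assumptions rather than part of the described system.

```python
import operator
import re
from typing import Mapping, Sequence

CONSTRAINT = re.compile(
    r"DF\[(?P<index>FIRST|LAST(?:-(?P<back>\d+))?)\]"      # which discourse function
    r"\.PF\[(?P<feature>\w+)\]"                             # which prosodic feature
    r"\s*(?P<op>>=|<=|>|<|=)\s*(?P<value>\d+(?:\.\d+)?)"    # comparison and threshold
)
COMPARE = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
           "<=": operator.le, "=": operator.eq}


def constraint_holds(expression: str,
                     features: Sequence[Mapping[str, float]]) -> bool:
    """Evaluate one prosodic constraint against per-discourse-function features."""
    match = CONSTRAINT.fullmatch(expression.strip())
    if match is None:
        raise ValueError(f"unsupported constraint: {expression!r}")
    if match.group("index") == "FIRST":
        position = 0
    else:
        # "LAST" is the final entry; "LAST-1" is the one before it, and so on.
        position = len(features) - 1 - int(match.group("back") or 0)
    if not 0 <= position < len(features):
        return False
    observed = features[position].get(match.group("feature"))
    if observed is None:
        return False
    return COMPARE[match.group("op")](observed, float(match.group("value")))


observed = [{"SILENCE_FOLLOWING": 0.02},
            {"SILENCE_FOLLOWING": 0.05},
            {"SILENCE_FOLLOWING": 0.25}]
turn_candidate = constraint_holds("DF[LAST].PF[SILENCE_FOLLOWING] > 0.10", observed)
```

In this sketch a satisfied constraint only makes the row a candidate; the preceding discourse function sequence must also match before the stored prediction value applies.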

[0067] The preceding discourse function portion 730 contains the value
"(A1-A6)A1A4;". This indicates that a characteristic three discourse function
sequence precedes the turn. The discourse function sequence starts with any one
discourse function from the set of discourse functions of type "A1 through A6". The
second and third preceding discourse functions in the preceding discourse sequence
are discourse functions of type A1 and A4, respectively.

[0068] The trailing discourse function sequence portion 740 is empty. This
indicates that the first row of the exemplary data structure for storing predictive
interaction models is associated with an anterior, prior or leading predictive
interaction model. Thus, since the turn prediction is not conditioned on one or more
trailing discourse function sequences, the predictive interaction model is well suited to
interactive environments.
[0069] The prediction portion 750 of the exemplary data structure for storing
predictive interaction models contains the value 0.60. This value indicates the
likelihood of a turn if the prosodic and discourse function constraints are satisfied.

[0070] The second row of the data structure for storing predictive interaction
models 700 also contains the value "A13D" in the user id portion 710. This indicates
that the predictive interaction model encoded in the second row is also associated with
the user identified as "A13D". The predictive interaction model in the second row has
values in the trailing discourse function sequence portion 740. This indicates that the
predictive interaction model is a posterior predictive interaction model. Thus, the
posterior predictive interaction model may be selected for transcription or other
non-interactive tasks.

[0071] The prosodic features portion 720 of the second row of the
exemplary data structure for storing predictive interaction models 700 contains the
value:

{(DF[FIRST].PF[PITCH_FREQUENCY_INITIAL] > 110);                    (1)
(DF[LAST].PF[STRESS].SEGMENT.LAST = TRUE);}

[0072] This indicates that if the initial pitch frequency associated with the
"FIRST" discourse function in the discourse function sequence is greater than 110 Hz
and there is a stress on the last segment of the last discourse function, and the
preceding and trailing discourse function sequence constraints are satisfied, then a
turn is predicted with a probability of 75%. The turn prediction value is higher since
this predictive interaction model is a posterior or trailing model that uses prior and
posterior prosodic features and prior and posterior discourse function sequences.
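
The two prosodic conditions of expression (1) are joined conjunctively, so both must hold. A minimal sketch of that check follows; the record type, its attribute names and the representation of stress as one flag per segment are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ObservedDiscourseFunction:
    code: str                    # discourse function type, e.g. "A2"
    initial_pitch_hz: float      # stands in for PITCH_FREQUENCY_INITIAL
    segment_stress: List[bool]   # one flag per segment; True where the segment is stressed


def second_row_prosody_holds(sequence: Sequence[ObservedDiscourseFunction]) -> bool:
    """Check both conditions of expression (1) against an observed sequence."""
    if not sequence:
        return False
    first, last = sequence[0], sequence[-1]
    pitch_ok = first.initial_pitch_hz > 110.0
    stress_ok = bool(last.segment_stress) and last.segment_stress[-1]
    return pitch_ok and stress_ok


utterance = [
    ObservedDiscourseFunction("A3", 125.0, [False]),
    ObservedDiscourseFunction("A2", 98.0, [False, True]),
]
prosody_satisfied = second_row_prosody_holds(utterance)
```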

[0073] The preceding discourse function portion 730 contains the value
"(A1-A8)A2A2A9; .A1A3+A9;". This indicates that either of two characteristic
discourse function sequences precedes the predicted turn. The first and second
sequences of discourse functions are separated by the ";" symbol.
[0074] The first discourse function sequence starts with any one of the
discourse functions from the set A1-A8. The second and third discourse functions in
the first sequence are discourse functions of type "A2". The fourth discourse function
in the first sequence is a discourse function of type "A9".
[0075] The second discourse function sequence starts with the "." symbol.
This symbol matches any type of discourse function. Thus, the second sequence
reflects a pattern that matches the tail or ending of a string of discourse functions
starting with a discourse function of any type. The second discourse function in the
sequence is a discourse function of type "A1". The third discourse function in the
sequence is a discourse function of type "A3". The "+" symbol is used to indicate that
at least one discourse function of type "A9" must follow.

[0076] The trailing discourse function sequence portion 740 contains the
value "A8A1". The presence of a trailing discourse function sequence
indicates that the second row of the exemplary data structure for storing predictive
interaction models is associated with a posterior or trailing predictive interaction model.
The predictive interaction model is not well suited to interactive environments since
the turn prediction is conditioned on a trailing discourse function sequence. The trailing
discourse function sequence portion 740 requires that a discourse function of type
"A8" immediately follow a turn, followed by a discourse function of type "A1".

[0077] The prediction portion 750 of the exemplary data structure for storing
predictive interaction models contains the value 0.75. This value indicates the
likelihood of a turn if the prosodic and discourse function constraints are satisfied.
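
The trailing constraint of a posterior model can only be tested once the discourse functions after the candidate turn are known, which is why such a model suits delay-tolerant use. The sketch below checks a trailing sequence such as "A8A1" against the codes that follow a candidate turn. It handles only plain codes and the "." wildcard, treats an empty portion as "no trailing constraint", and assumes codes of one capital letter plus one digit; the function name is hypothetical.

```python
import re
from typing import Sequence


def follows_turn(trailing_portion: str, following: Sequence[str]) -> bool:
    """True if the codes right after a candidate turn match the trailing sequence."""
    pattern = trailing_portion.replace(" ", "")
    if not pattern:
        return True  # prior (leading) model: nothing is required after the turn
    tokens = re.findall(r"[A-Z]\d|\.", pattern)
    if "".join(tokens) != pattern:
        raise ValueError(f"unsupported trailing sequence: {trailing_portion!r}")
    # "." stands for any single code; every other token must match literally.
    regex = "".join(r"[A-Z]\d" if token == "." else token for token in tokens)
    # re.match anchors at the start, so the first code must immediately follow the turn.
    return re.match(regex, "".join(following)) is not None


confirmed = follows_turn("A8A1", ["A8", "A1", "A4"])
```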

[0078] The last row of the data structure for storing predictive interaction
models 700 contains the value "B24F" in the user id portion 710. This indicates that
the predictive interaction model is associated with the user identified as "B24F". As
discussed above, the user is identified to the system for determining interactions using
voice print recognition, a user login process or any known or later developed method
of identifying the user to the system for determining interactions.

[0079] The prosodic features portion 720 of the last row of the exemplary
data structure for storing predictive interaction models 700 contains the value
"DF[LAST-1].PF[SILENCE_TRAILING] > 0.15". This indicates that if the "LAST-1"
discourse function is followed by a trailing silence greater than 0.15 seconds and
the preceding and trailing discourse function sequence constraints are satisfied, then a
turn is predicted with a probability of 64%.

[0080] The preceding discourse function portion 730 contains the value
"(A1-A6)A1A4". This indicates that a characteristic three discourse function
sequence precedes the turn. The discourse function sequence starts with any one
discourse function from the set of discourse functions "A1 through A6". The second
and third preceding discourse functions in the preceding discourse sequence are
discourse functions of type A1 and A4, respectively.

[0081] The trailing discourse function sequence portion 740 contains the value
"A3.A2". This indicates that the interaction model is a posterior or trailing interaction
model not well suited to interactive environments. The "." symbol matches any
type of discourse function. Thus, the three discourse function sequence starts with a
discourse function of type "A3" followed by a discourse function of any type and in
turn followed by a discourse function of type "A2".

[0082] The prediction portion 750 of the exemplary data structure for storing
predictive interaction models contains the value "0.64" indicating the likelihood of a
turn if the prosodic and discourse function constraints are satisfied.
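
The three rows described above can be collected into a small table, and the distinction between prior and posterior models then becomes a simple filter on the trailing portion. The sketch below is an illustration only: the type and function names are hypothetical, and the rule that non-interactive use may draw on both prior and posterior rows is an assumption based on the statement that posterior models may be used alone or in combination with prior models.

```python
from typing import List, NamedTuple, Sequence


class ModelRow(NamedTuple):
    user_id: str            # user identifier 710
    prosodic_features: str  # prosodic features portion 720
    preceding: str          # preceding discourse function sequence portion 730
    trailing: str           # trailing discourse function sequence portion 740
    prediction: float       # prediction portion 750


TABLE = [
    ModelRow("A13D", "DF[LAST].PF[SILENCE_FOLLOWING] > 0.10",
             "(A1-A6)A1A4;", "", 0.60),
    ModelRow("A13D",
             "{(DF[FIRST].PF[PITCH_FREQUENCY_INITIAL] > 110); "
             "(DF[LAST].PF[STRESS].SEGMENT.LAST = TRUE);}",
             "(A1-A8)A2A2A9; .A1A3+A9;", "A8A1", 0.75),
    ModelRow("B24F", "DF[LAST-1].PF[SILENCE_TRAILING] > 0.15",
             "(A1-A6)A1A4", "A3.A2", 0.64),
]


def candidate_models(table: Sequence[ModelRow], user_id: str,
                     interactive: bool) -> List[ModelRow]:
    """Rows usable for a user; interactive use keeps only prior (leading) models."""
    rows = [row for row in table if row.user_id == user_id]
    if interactive:
        # A prior model has an empty trailing portion, so it needs no look-ahead.
        rows = [row for row in rows if not row.trailing]
    return rows


interactive_rows = candidate_models(TABLE, "A13D", interactive=True)
dictation_rows = candidate_models(TABLE, "A13D", interactive=False)
```

Under these assumptions user "B24F" has no model suitable for interactive use, because the only stored row for that user is conditioned on a trailing sequence.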

[0083] Each of the circuits 10-60 of the system for determining predictive
interaction models 100 described in Fig. 3 and circuits 100-150 of the system for
determining interactions 200 described in Fig. 5 can be implemented as portions of a
suitably programmed general-purpose computer. Alternatively, circuits 10-60 and
100-150 outlined above can be implemented as physically distinct hardware circuits
within an ASIC, or using an FPGA, a PDL, a PLA or a PAL, or using discrete logic
elements or discrete circuit elements. The particular form each of the circuits 10-60 of
the system for determining predictive interaction models 100 and 100-150 of the
system for determining interactions 200 as outlined above will take is a design choice
and will be obvious and predictable to those skilled in the art.

[0084] Moreover, the system for determining predictive interaction models
100 and the system for determining interactions 200 and/or each of the various circuits
discussed above can each be implemented as software routines, managers or objects
executing on a programmed general purpose computer, a special purpose computer, a
microprocessor or the like. In this case, the system for determining predictive
interaction models 100 and the system for determining interactions 200 and/or each of
the various circuits discussed above can each be implemented as one or more routines
embedded in the communications network, as a resource residing on a server, or the
like. The system for determining predictive interaction models 100 and the system for
determining interactions 200 and the various circuits discussed above can also be
implemented by physically incorporating the system for determining predictive
interaction models 100 and/or the system for determining interactions 200 into
a software and/or hardware system, such as the hardware and software systems of a
web server or a client device.

[0085] As shown in Figs. 3 and 5, memory 20 and 110 can be implemented
using any appropriate combination of alterable, volatile or non-volatile memory or
non-alterable, or fixed, memory. The alterable memory, whether volatile or non-volatile, can
be implemented using any one or more of static or dynamic RAM, a floppy disk and
disk drive, a write-able or rewrite-able optical disk and disk drive, a hard drive, flash
memory or the like. Similarly, the non-alterable or fixed memory can be implemented
using any one or more of ROM, PROM, EPROM, EEPROM, an optical ROM disk,
such as a CD-ROM or DVD-ROM disk, and disk drive or the like.

[0086] The communication links 99 shown in Figs. 1, 3 and 5 can each be
any known or later developed device or system for connecting a communication
device to the system for determining predictive interaction models 100 and the system
for determining interactions 200, including a direct cable connection, a connection
over a wide area network or a local area network, a connection over an intranet, a
connection over the Internet, or a connection over any other distributed processing
network or system. In general, the communication links 99 can be any known or later
developed connection system or structure usable to connect devices and facilitate
communication.

[0087] Further, it should be appreciated that the communication links 99 can
be wired or wireless links to a network. The network can be a local area network, a
wide area network, an intranet, the Internet, or any other distributed processing and
storage network.

[0088] While this invention has been described in conjunction with the
exemplary embodiments outlined above, it is evident that many alternatives,
modifications and variations will be apparent to those skilled in the art. Accordingly,
the exemplary embodiments of the invention, as set forth above, are intended to be
illustrative, not limiting. Various changes may be made without departing from the
spirit and scope of the invention.



